Code
library(tidyverse)
library(here)
library(lubridate)
library(tsibble)
library(feasts)
library(fable)
library(kableExtra)
library(patchwork)
library(skimr)The goal of this analysis is to visualize time series data of fish passage at the Willamette Falls fish ladder to identify trends, seasonal patterns, and annual variations in fish passage. Additionally, the analysis includes forecasting using the Holt-Winters method to predict future trends in salmon runs, providing critical information for watershed management and fish recovery efforts.
Fish passage at Willamette Falls is critical for the survival of Oregon’s wild salmon and steelhead, which are on the brink of extinction due to dams blocking access to spawning habitats and degrading river ecosystems. With populations at just 1–2% of their historic levels, this analysis helps uncover migration trends and forecast future passage, providing valuable insights to guide recovery efforts, improve watershed management, and ensure these iconic species are protected for future generations.
This project explores adult fish passage data recorded at the Willamette Falls fish ladder on the Willamett River, Oregon from 2001 to 2010. The dataset provides valuable insights into the migration patterns of five salmon species: Chinook, Jack Chinook, Steelhead, Coho, and Jack Coho.
Data Citation: U.S. Army Corps of Engineers, NWD, et al. DART Adult Passage Counts Daily for All Species. Data accessed via Columbia River DART (Data Access in Real Time) on January 25, 2023.
The following libraries will be used for data manipulation, visualization, and building regression models.
library(tidyverse)
library(here)
library(lubridate)
library(tsibble)
library(feasts)
library(fable)
library(kableExtra)
library(patchwork)
library(skimr)Import data, replace NA values with zero, convert Date column from character to data and convert df to tsibble using the as_tsibble() function.
# Import data
fish_df <- read_csv(here('data', 'willamette_fish_passage.csv')) %>%
#replace(is.na(.),0) %>%
janitor::clean_names()
# Convert to ts
fish_ts <- fish_df %>%
mutate(date = mdy(date)) %>%
as_tsibble(key = NULL,
index = date) %>%
replace(is.na(.), 0)The original dataset contains multiple empty columns with over 3,000 missing values. Therefore, this study focuses on three salmon species—Coho, Jack Coho, and Steelhead salmon—for further analysis.
skim(fish_df)| Name | fish_df |
| Number of rows | 3652 |
| Number of columns | 16 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| logical | 8 |
| numeric | 6 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| project | 0 | 1 | 16 | 16 | 0 | 1 | 0 |
| date | 0 | 1 | 6 | 8 | 0 | 3652 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| chinook_run | 3652 | 0 | NaN | : |
| wild_steelhead | 3652 | 0 | NaN | : |
| sockeye | 3652 | 0 | NaN | : |
| shad | 3652 | 0 | NaN | : |
| lamprey | 3652 | 0 | NaN | : |
| bull_trout | 3652 | 0 | NaN | : |
| chum | 3652 | 0 | NaN | : |
| pink | 3652 | 0 | NaN | : |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| chinook | 1543 | 0.58 | 250.48 | 476.52 | 1.0 | 7.00 | 35.0 | 236.00 | 3883.0 | ▇▁▁▁▁ |
| jack_chinook | 2433 | 0.33 | 11.22 | 16.37 | 1.0 | 2.00 | 6.0 | 14.00 | 170.0 | ▇▁▁▁▁ |
| steelhead | 203 | 0.94 | 85.97 | 115.68 | -2.0 | 8.00 | 28.0 | 127.00 | 709.0 | ▇▂▁▁▁ |
| coho | 2694 | 0.26 | 76.03 | 160.84 | -2.0 | 3.25 | 15.0 | 57.75 | 1290.0 | ▇▁▁▁▁ |
| jack_coho | 2971 | 0.19 | 21.03 | 37.02 | 1.0 | 2.00 | 7.0 | 22.00 | 380.0 | ▇▁▁▁▁ |
| temp_c | 1382 | 0.62 | 12.94 | 6.04 | 0.6 | 7.80 | 11.7 | 17.80 | 26.7 | ▂▇▆▃▃ |
To better understand the structure of time series data, it’s useful to decompose to separately explore contributions of the different components (trend, seasonality, etc). This study employs Seasonal and Trend Decomposition using LOESS (Locally Estimated Scatterplot Smoothing), a method for estimating nonlinear relationships. It applies a weighted moving average across all points in the dataset, with weights determined by their distance from the point being averaged. STL() function from the feasts is used to obtain decomposition.
# Prepare data for plotting
fish_long <- fish_ts %>%
select('date', 'steelhead','coho', 'jack_coho') %>%
pivot_longer(
cols = -date,
names_to = "species",
values_to = "counts"
)
# Obtain decomposition
jack_dcmp <- fish_long %>%
filter(species =='jack_coho') %>%
model(STL(counts~season(period = "1 year") + trend(window = 25)))
# Visualize components
components(jack_dcmp) %>%
autoplot()+
theme_classic()# Obtain decomposition
coho_dcmp <- fish_long %>%
filter(species =='coho') %>%
model(STL(counts~season(period = "1 year") + trend(window = 25)))
# Visualize components
components(coho_dcmp) %>%
autoplot()+
theme_classic()# Obtain decomposition
jack_dcmp <- fish_long %>%
filter(species =='steelhead') %>%
model(STL(counts~season(period = "1 year") + trend(window = 25)))
# Visualize components
components(jack_dcmp) %>%
autoplot()+
theme_classic()# Create plot
ggplot(fish_long, aes(x=date, y=counts, color=species)) +
geom_line() +
labs(title = "Time Series Plots of Salmon Adult Passage",
x = "Time",
y = "Counts") +
scale_color_brewer(palette = "Set2")+
theme_minimal()+
facet_wrap(~species,ncol=1)+
theme(legend.position='none')From the time series plots in Figure 1, we can observe the following observations:
A seasonplot can help point out seasonal patterns, and help to glean insights over the years. We’ll use feasts::gg_season() to create an exploratory seasonplot, which has month on the x-axis, salmon counts on the y-axis, and each year is its own series (mapped by line color).
fish_long %>%
gg_season(y=counts, pal=hcl.colors(n=10)) +
theme_light()+
labs(title='Seasonplots of Daily Counts for Salmon Passage by Species',
x="month",
y="Species Counts")Figure 2 displays the seasonal variation in the counts of salmon species across all years, with each line representing a specific year and highlighting recurring monthly patterns. Since the daily counts were too stochastic and busy, especially for steelhead salmon, I opted to use monthly means for clearer interpretation.
Seasonplots with monthly means:
fish_month <- fish_long %>%
index_by(yr_mo = ~yearmonth(.)) %>%
group_by(species) %>%
summarize(month_mean = mean(counts, na.rm=TRUE)) %>%
ungroup()
#fish_month
fish_month %>%
gg_season(y=month_mean, pal=hcl.colors(n=10)) +
theme_light()+
labs(title='Seasonplots of Monthly Average Counts for Salmon Passage by Species',
x="month",
y="Species Counts")Figure 2 and 3 shows the monthly variation in counts for three salmon species across multiple years:
# Prep data for plotting
fish_yr <- fish_long %>%
index_by(year = ~year(.)) %>%
group_by(species) %>%
summarize(annual_total = sum(counts)) %>%
ungroup()
# Convert year column to factor
fish_yr$year = as.factor(fish_yr$year)
fish_yr %>%
ggplot(aes(x = year, y = annual_total, color = species, group = species)) + # Explicitly group by species
geom_line() +
theme_minimal() +
labs(
title = "Annual Total Counts of Fish Passage by Species",
x = "Year",
y = "Counts"
) +
scale_color_brewer(palette = 'Set2')We use the ETS() function from the fable package to generate predictions. Exponential smoothing calculates weighted averages of past observations, with the weights decreasing exponentially as the observations become older. Given the change in variance over time, we specify multiplicative seasonality, and set restrict as FALSE to allow function to find best model.
The results show negative counts for Steelhead and Jack Coho salmon, which is unrealistic and indicates that this forecasting method is not suitable for salmon data.
salmon_ts <- fish_ts %>% select(date, coho, jack_coho, steelhead)
# Reshape the data into a long format
salmon_long <- salmon_ts %>%
pivot_longer(cols = c(coho, jack_coho, steelhead),
names_to = "species",
values_to = "counts")
# Fit the ETS model for each species
ets_models <- salmon_long %>%
model(ets = ETS(counts~season("M"), restrict=FALSE))
# Generate forecasts for each species
forecasts <- ets_models %>%
forecast(h = "5 years")
# Plot the forecasts along with historical data
forecasts %>%
autoplot(salmon_long, level = NULL) +
labs(
title = "Historical and Forecasted Salmon Passage by Species",
y = "Counts",
x = "Date"
) +
theme_minimal()kable(ets_models, col.names=c("Species","ETS Model(Error, Trend, Seasonality)"), caption = "ETS Models by Species")| Species | ETS Model(Error, Trend, Seasonality) |
|---|---|
| coho | |
| jack_coho | <ETS(A,N,M)> |
| steelhead | <ETS(A,N,M)> |
The differences in the ETS models arise because the ETS() function automatically selects the best model based on the data for each species. For Jack Coho and Steelhead, the data likely exhibited clear seasonal patterns with varying magnitudes, leading to the selection of a model with multiplicative seasonality. In contrast, Coho data may not show significant seasonal or trend components, so no model were selected.
Forecast results suggest that forcing a seasonal model may not be appropriate. Further exploration is needed to find the correct model for forecasting. Here we use a autocorrelation function to confirm our interpretation.
Autocorrelation function (ACF) shows how correlated observations within a series are with pervious observations on the same variable. In the context of salmon passage, the ACF helps identify patterns in the monthly average counts of steelhead salmon passing through the ladder over time. This analysis is useful for guiding selection of models to forecast future values in a time series.
For Steelhead salmon, I do not expect exponential smoothing (ETS) to be the most effective forecasting method due to its broader and more variable seasonal pattern and significant interannual variability.
# autocorrelation for steelhead
fish_month %>%
filter(species == 'steelhead') %>%
ACF(month_mean) %>%
autoplot() +
theme_minimal()+
labs(
title="ACF plot for Steelhead Salmon Passage",
x="lag (in month)",
y="autocorrelation coefficient"
)For Coho salmon, I do not expect exponential smoothing (ETS) to be the most effective forecasting method due to its broader and more variable seasonal pattern and significant interannual variability.
# autocorrelation for steelhead
fish_month %>%
filter(species == 'coho') %>%
ACF(month_mean) %>%
autoplot() +
theme_minimal()+
labs(
title="ACF plot for Steelhead Salmon Passage",
x="lag (in month)",
y="autocorrelation coefficient"
)# autocorrelation for steelhead
fish_month %>%
filter(species == 'jack_coho') %>%
ACF(month_mean) %>%
autoplot() +
theme_minimal()+
labs(
title="ACF plot for Steelhead Salmon Passage",
x="lag (in month)",
y="autocorrelation coefficient"
)For Jack Coho salmon, the highest correlation coefficient occurs at lags of 12 months, indicating seasonality. This suggests that exponential smoothing could be a suitable forecasting method.